A Multilingual Approach to Building Slovene Wordnet

Author

  • Darja Fišer
Abstract

The paper presents an experiment in which synsets for the Slovene wordnet were induced automatically from several multilingual resources. Our research is based on the assumption that translations are a plausible source of semantically relevant information. More specifically, we argue that the translational relation on the one hand reduces the ambiguity of a source word and on the other conveys the semantic relatedness of a set of target words. We tried to identify sense distinctions of polysemous words and obtain sets of synonyms by first extracting multilingual lexicons from a word-aligned JRC-Acquis parallel corpus and then comparing them with the already existing wordnets in various languages. During this comparison, lexicon entries were disambiguated and appropriate synset ids were assigned to their Slovene translation equivalents. Finally, the Slovene lexicon entries sharing the same assigned synset id were organized into a synset.
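The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes that disambiguation is done by intersecting the synset ids that the existing wordnets assign to each translation in a lexicon entry. The abstract does not spell out the exact procedure, so this is one plausible reading and not the author's implementation. The function name `induce_synsets`, the language codes, the `ili-*` ids and the toy data are all hypothetical.

```python
from collections import defaultdict


def induce_synsets(lexicon, wordnets, target="sl"):
    """Group target-language lemmas by synset ids shared by their translations.

    lexicon:  list of dicts mapping a language code to a lemma (one aligned entry each)
    wordnets: dict mapping a language code to {lemma: set of synset ids}
    Assumption: an entry is disambiguated by intersecting the synset ids
    of all its translations that are covered by an existing wordnet.
    """
    synsets = defaultdict(set)
    for entry in lexicon:
        if target not in entry:
            continue
        # Collect the candidate synset ids of every non-target translation.
        candidate_sets = [
            wordnets[lang].get(lemma, set())
            for lang, lemma in entry.items()
            if lang != target and lang in wordnets
        ]
        if not candidate_sets:
            continue
        # Ids shared by all translations are taken as the senses of this entry.
        shared = set.intersection(*candidate_sets)
        for synset_id in shared:
            synsets[synset_id].add(entry[target])
    return dict(synsets)


# Invented toy data: two existing wordnets and three aligned lexicon entries.
wordnets = {
    "en": {"bank": {"ili-1", "ili-2"}, "shore": {"ili-2"}},
    "cs": {"banka": {"ili-1"}, "břeh": {"ili-2"}},
}
lexicon = [
    {"en": "bank", "cs": "banka", "sl": "banka"},
    {"en": "bank", "cs": "břeh", "sl": "breg"},
    {"en": "shore", "cs": "břeh", "sl": "obala"},
]

result = induce_synsets(lexicon, wordnets)
for synset_id, lemmas in sorted(result.items()):
    print(synset_id, sorted(lemmas))
```

In this toy run the ambiguous English word "bank" is split into two senses by its Czech translations, and the two Slovene lemmas that end up with the same id are grouped into one synset. Entries whose translations share no synset id are simply dropped, which is a design choice of the sketch.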


Similar resources

Using Multilingual Resources for Building SloWNet Faster

This project report presents the results of an approach in which synsets for Slovene wordnet were induced automatically from parallel corpora and already existing wordnets. First, multilingual lexicons were obtained from word-aligned corpora and compared to the wordnets in various languages in order to disambiguate lexicon entries. Then appropriate synset ids were attached to Slovene entries fr...

Enriching Slovene WordNet with domain-specific terms

The paper describes an innovative approach to expanding the domain coverage of wordnet by exploiting multiple resources. In the experiment described here we are using a large monolingual Slovene corpus of texts from the domain of informatics to harvest terminology from, and a parallel English-Slovene corpus and an online dictionary as bilingual resources to facilitate the mapping of terms to th...

Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet

The paper reports on a series of experiments conducted in order to test the feasibility of automatically generating synsets for Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons and then these lexicons were compared to the wordnet...

sloWNet: construction and corpus annotation

This paper presents a wordnet for Slovene which was created semi-automatically with a combination of approaches and multilingual resources, in particular a bilingual dictionary, a parallel corpus and Wikipedia. Analysis of the results shows that the dictionary approach yields a good core wordnet but requires substantial manual editing due to a lack of automatic word-sense disambiguation. This w...

Building Slovene WordNet

A WordNet is a lexical database in which nouns, verbs, adjectives and adverbs are organized in a conceptual hierarchy, linking semantically and lexically related concepts. Such semantic lexicons have become one of the most valuable resources for a wide range of NLP research and applications, such as semantic tagging, automatic word-sense disambiguation, information retrieval and document summar...

Publication date: 2007